In [ ]:
epochs = 10
# We don't use the whole dataset for efficiency purposes, but feel free to increase these numbers
n_train_items = 640
n_test_items = 640
When building Machine Learning as a Service (MLaaS) solutions, a company might need to request access to data held by other partners in order to train its model. In health or in finance, both the model and the data are extremely sensitive: the model parameters are a business asset, while the data is personal data which is tightly regulated.
In this context, one possible solution is to encrypt both the model and the data and to train the machine learning model over the encrypted values. This guarantees, for example, that the company won't access patients' medical records and that health facilities won't be able to observe the model to which they contribute. Several encryption schemes allow computation over encrypted data, among which Secure Multi-Party Computation (SMPC), Homomorphic Encryption (FHE/SHE) and Functional Encryption (FE). We will focus here on Secure Multi-Party Computation (which was introduced in Tutorial 5), which consists of private additive sharing and relies on the crypto protocols SecureNN and SPDZ.
The exact setting of this tutorial is the following: consider that you are the server and you would like to train your model on some data held by $n$ workers. The server secret shares its model and sends one share to each worker. The workers also secret share their data and exchange these shares among themselves. In the configuration we will study, there are 2 workers: alice and bob. After exchanging shares, each of them holds one share of its own data, one share of the other worker's data, and one share of the model. Computation can now start to privately train the model using the appropriate crypto protocols. Once the model is trained, all the shares can be sent back to the server so it can decrypt the model. This is illustrated in the following figure:
To give an example of this process, let's assume alice and bob both hold a part of the MNIST dataset and let's train a model to perform digit classification!
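To build some intuition for what additive secret sharing means before we dive in, here is a minimal, self-contained sketch of the idea (purely illustrative, not PySyft's actual implementation): a secret integer is split into random shares that sum to the secret modulo a large public value Q, so that no single share reveals anything on its own.

import random

Q = 2**62  # large public modulus; all shares live in the ring of integers modulo Q (illustrative value)

def share(x, n_workers=2):
    """Split the integer x into n_workers additive shares that sum to x mod Q."""
    shares = [random.randrange(Q) for _ in range(n_workers - 1)]
    shares.append((x - sum(shares)) % Q)
    return shares

def reconstruct(shares):
    """Recombine all shares to recover the secret."""
    return sum(shares) % Q

alice_share, bob_share = share(25)
assert reconstruct([alice_share, bob_share]) == 25

# Addition comes for free: each worker adds its shares locally,
# and the local sums reconstruct to the sum of the secrets.
assert reconstruct([a + b for a, b in zip(share(3), share(4))]) == 7

Other operations (multiplication, comparison) require dedicated protocols such as SPDZ and SecureNN, which PySyft handles for us.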
In [ ]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
import time
This class describes all the hyper-parameters for the training. Note that they are all public here.
In [ ]:
class Arguments():
    def __init__(self):
        self.batch_size = 64
        self.test_batch_size = 64
        self.epochs = epochs
        self.lr = 0.02
        self.seed = 1
        self.log_interval = 1  # Log info at each batch
        self.precision_fractional = 3

args = Arguments()
_ = torch.manual_seed(args.seed)
Here are the PySyft imports. We connect to two remote workers, which we call alice and bob, and request another worker called the crypto_provider, who provides all the crypto primitives we may need.
In [ ]:
import syft as sy  # import the PySyft library
hook = sy.TorchHook(torch) # hook PyTorch to add extra functionalities like Federated and Encrypted Learning
# simulation functions
def connect_to_workers(n_workers):
    return [
        sy.VirtualWorker(hook, id=f"worker{i+1}")
        for i in range(n_workers)
    ]

def connect_to_crypto_provider():
    return sy.VirtualWorker(hook, id="crypto_provider")
workers = connect_to_workers(n_workers=2)
crypto_provider = connect_to_crypto_provider()
Here we're using a utility function which simulates the following behaviour: we assume the MNIST dataset is distributed in parts, each of which is held by one of our workers. The workers then split their data into batches and secret share it with each other. The final object returned is an iterable over these secret-shared batches, which we call the private data loader. Note that during this process the local worker (i.e. us) never has access to the data.
As usual, we obtain a private training and a private testing dataset, where both the inputs and the labels are secret shared.
In [ ]:
def get_private_data_loaders(precision_fractional, workers, crypto_provider):

    def one_hot_of(index_tensor):
        """
        Transform to one hot tensor
        Example:
            [0, 3, 9]
            =>
            [[1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
             [0., 0., 0., 1., 0., 0., 0., 0., 0., 0.],
             [0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]]
        """
        onehot_tensor = torch.zeros(*index_tensor.shape, 10)  # 10 classes for MNIST
        onehot_tensor = onehot_tensor.scatter(1, index_tensor.view(-1, 1), 1)
        return onehot_tensor

    def secret_share(tensor):
        """
        Transform to fixed precision and secret share a tensor
        """
        return (
            tensor
            .fix_precision(precision_fractional=precision_fractional)
            .share(*workers, crypto_provider=crypto_provider, requires_grad=True)
        )

    transformation = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))
    ])

    train_loader = torch.utils.data.DataLoader(
        datasets.MNIST('../data', train=True, download=True, transform=transformation),
        batch_size=args.batch_size
    )

    private_train_loader = [
        (secret_share(data), secret_share(one_hot_of(target)))
        for i, (data, target) in enumerate(train_loader)
        if i < n_train_items / args.batch_size
    ]

    test_loader = torch.utils.data.DataLoader(
        datasets.MNIST('../data', train=False, download=True, transform=transformation),
        batch_size=args.test_batch_size
    )

    private_test_loader = [
        (secret_share(data), secret_share(target.float()))
        for i, (data, target) in enumerate(test_loader)
        if i < n_test_items / args.test_batch_size
    ]

    return private_train_loader, private_test_loader


private_train_loader, private_test_loader = get_private_data_loaders(
    precision_fractional=args.precision_fractional,
    workers=workers,
    crypto_provider=crypto_provider
)
Here is the model that we will use. It's a rather simple one, but it has proved to perform reasonably well on MNIST.
In [ ]:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(28 * 28, 128)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 10)

    def forward(self, x):
        x = x.view(-1, 28 * 28)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
In [ ]:
def train(args, model, private_train_loader, optimizer, epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(private_train_loader):  # <-- now it is a private dataset
        start_time = time.time()

        optimizer.zero_grad()

        output = model(data)

        # loss = F.nll_loss(output, target)  <-- not possible here
        batch_size = output.shape[0]
        loss = ((output - target)**2).sum().refresh()/batch_size

        loss.backward()

        optimizer.step()

        if batch_idx % args.log_interval == 0:
            loss = loss.get().float_precision()
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}\tTime: {:.3f}s'.format(
                epoch, batch_idx * args.batch_size, len(private_train_loader) * args.batch_size,
                100. * batch_idx / len(private_train_loader), loss.item(), time.time() - start_time))
The test function does not change!
In [ ]:
def test(args, model, private_test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in private_test_loader:
            start_time = time.time()

            output = model(data)
            pred = output.argmax(dim=1)
            correct += pred.eq(target.view_as(pred)).sum()

    correct = correct.get().float_precision()
    print('\nTest set: Accuracy: {}/{} ({:.0f}%)\n'.format(
        correct.item(), len(private_test_loader) * args.test_batch_size,
        100. * correct.item() / (len(private_test_loader) * args.test_batch_size)))
A few notes about what's happening here. First, we secret share all the model parameters across our workers. Second, we convert the optimizer's hyperparameters to fixed precision. Note that we don't need to secret share them, because they are public in our context, but since secret-shared values live in finite fields we still need to move them into finite fields using .fix_precision, in order to consistently perform operations like the weight update $W \leftarrow W - \alpha \cdot \Delta W$.
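To see concretely what fixed precision means, here is a tiny illustrative sketch (not PySyft's actual implementation): each float is scaled by base**precision_fractional and rounded to an integer, and the weight update is then carried out entirely on integers, with a truncation step after each multiplication to remove the extra scale factor.

BASE = 10
PRECISION_FRACTIONAL = 3
SCALE = BASE ** PRECISION_FRACTIONAL  # 1000

def encode(x):
    """Encode a float as a fixed-precision integer."""
    return int(round(x * SCALE))

def decode(x_fp):
    """Decode a fixed-precision integer back to a float."""
    return x_fp / SCALE

w_fp, grad_fp, lr_fp = encode(0.5), encode(0.2), encode(0.02)

# W <- W - lr * grad, computed on integers only:
# the product of two encoded values carries an extra factor SCALE, hence the truncation.
w_fp = w_fp - (lr_fp * grad_fp) // SCALE
print(decode(w_fp))  # 0.496, i.e. 0.5 - 0.02 * 0.2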
In [ ]:
model = Net()
model = model.fix_precision().share(*workers, crypto_provider=crypto_provider, requires_grad=True)

optimizer = optim.SGD(model.parameters(), lr=args.lr)
optimizer = optimizer.fix_precision()

for epoch in range(1, args.epochs + 1):
    train(args, model, private_train_loader, optimizer, epoch)
    test(args, model, private_test_loader)
There you are! You just got 75% accuracy using only a tiny fraction of the MNIST dataset, with 100% encrypted training!
The first thing to notice is obviously the running time! As you have surely noticed, it is far slower than plaintext training. In particular, an iteration over 1 batch of 64 items takes 3.2s, while it takes only 13ms in pure PyTorch. While this might seem like a blocker, just recall that here everything happened remotely and in the encrypted world: not a single data item has been disclosed. More specifically, the time to process one item is 50ms, which is not that bad. The real question is to analyse when encrypted training is needed and when encrypted prediction alone is sufficient. 50ms to perform a prediction is completely acceptable in a production-ready scenario, for example!
One main bottleneck is the use of costly activation functions: the relu activation is very expensive with SMPC because it relies on private comparison and the SecureNN protocol. As an illustration, if we replace relu with a quadratic activation, as is done in several papers on encrypted computation such as CryptoNets, we drop from 3.2s to 1.2s per batch.
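For illustration, here is a sketch of what such a quadratic-activation variant of the model above could look like (this is our own assumption of the architecture, kept identical to Net except for the activation):

class QuadNet(nn.Module):
    def __init__(self):
        super(QuadNet, self).__init__()
        self.fc1 = nn.Linear(28 * 28, 128)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 10)

    def forward(self, x):
        x = x.view(-1, 28 * 28)
        x = self.fc1(x)
        x = x * x  # quadratic activation: one secret-shared multiplication instead of a private comparison
        x = self.fc2(x)
        x = x * x
        x = self.fc3(x)
        return x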
As a general rule, the key idea to take away is to encrypt only what's necessary, and this tutorial shows you how simple it can be.
You might wonder how we perform backpropagation and gradient updates even though we're working with integers in finite fields. To do so, we have developed a new syft tensor called AutogradTensor. This tutorial used it intensively, although you might not have seen it! Let's check this by printing one of the model's parameters:
In [ ]:
model.fc3.bias
And a data item:
In [ ]:
first_batch, input_data = 0, 0
private_train_loader[first_batch][input_data]
As you can see, the AutogradTensor is there! It lives between the torch wrapper and the FixedPrecisionTensor, which indicates that the values now live in finite fields. The goal of this AutogradTensor is to store the computation graph when operations are performed on encrypted values. This is useful because, when calling backward for backpropagation, this AutogradTensor overrides all the backward functions that are not compatible with encrypted computation and indicates how to compute the corresponding gradients. For example, multiplication is done using the Beaver triples trick, and we don't want to differentiate through the trick itself, especially since differentiating a multiplication is very easy: $\partial_b (a \cdot b) = a \cdot \partial b$. Here is how we describe how to compute these gradients, for example:
class MulBackward(GradFunc):
    def __init__(self, self_, other):
        super().__init__(self, self_, other)
        self.self_ = self_
        self.other = other

    def gradient(self, grad):
        grad_self_ = grad * self.other
        grad_other = grad * self.self_ if type(self.self_) == type(self.other) else None
        return (grad_self_, grad_other)
You can have a look at tensors/interpreters/gradients.py if you're curious to see how we implemented more gradients.
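For instance, a gradient function for addition follows the same pattern and is even simpler, since addition passes the incoming gradient through unchanged to both operands (a sketch for illustration; see gradients.py for the actual definitions):

class AddBackward(GradFunc):
    def __init__(self, self_, other):
        super().__init__(self, self_, other)
        self.self_ = self_
        self.other = other

    def gradient(self, grad):
        # The derivative of (a + b) with respect to each operand is 1,
        # so the incoming gradient is forwarded as-is to both.
        return (grad, grad)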
In terms of the computation graph, this means that a copy of the graph remains local and that the server which coordinates the forward pass also provides the instructions on how to perform the backward pass. This is a completely valid hypothesis in our setting.
Last, let's give a few hints about the security we're achieving here. The adversaries we consider are honest but curious: this means that an adversary can't learn anything about the data by running this protocol, but a malicious adversary could still deviate from the protocol and, for example, try to corrupt the shares to sabotage the computation. Security against malicious adversaries in such SMPC computations, including private comparison, is still an open problem.
In addition, even though Secure Multi-Party Computation ensures that the training data is never accessed directly, many threats from the plaintext world are still present here. For example, since you can make requests to the model (in the context of MLaaS), you can get predictions which might disclose information about the training dataset. In particular, you don't have any protection against membership attacks, a common attack on machine learning services in which the adversary wants to determine whether a specific item was used in the training dataset. Besides this, other attacks such as unintended memorization (models learning specific features about a data item), model inversion or model extraction are still possible.
One general solution which is effective for many of the threats mentioned above is to add Differential Privacy. It can be nicely combined with Secure Multi-Party Computation and can provide very interesting security guarantees. We're currently working on several implementations and hope to propose an example that combines both shortly!
As you have seen, training a model using SMPC is not complicated from a code point of view, even though we use rather complex objects under the hood. With this in mind, you should now analyse your use cases to see when encrypted computation is needed, either for training or for evaluation. While encrypted computation is much slower in general, it can also be used selectively so as to reduce the overall computation overhead.
If you enjoyed this and would like to join the movement toward privacy preserving, decentralized ownership of AI and the AI supply chain (data), you can do so in the following ways!
The easiest way to help our community is just by starring the repositories! This helps raise awareness of the cool tools we're building.
We made really nice tutorials to get a better understanding of what Federated and Privacy-Preserving Learning should look like and how we are building the bricks for this to happen.
The best way to keep up to date on the latest advancements is to join our community!
The best way to contribute to our community is to become a code contributor! If you want to start "one off" mini-projects, you can go to the PySyft GitHub Issues page and search for issues marked Good First Issue.
If you don't have time to contribute to our codebase, but would still like to lend support, you can also become a Backer on our Open Collective. All donations go toward our web hosting and other community expenses such as hackathons and meetups!